We would like to thank the reviewers for their helpful feedback, and for their positive comments regarding the originality of our work.
We will clarify this formulation in the paper. Conjugate MDPs are an interesting framework for learning useful abstractions. Thank you for your feedback; we will improve the legibility of all figures, with attention to black-and-white printing. "Ours" in Figure 2 is indeed using the best τ found for GAE when running with λ = 0. We will clarify this as well.
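For reference, a minimal sketch of the generalized advantage estimation (GAE) recursion mentioned above; with λ = 0 the estimator reduces to the one-step TD error. The function name, the toy rewards, and the value estimates below are illustrative assumptions, not the paper's implementation:

```python
def gae(rewards, values, gamma=0.99, lam=0.0):
    """Compute GAE advantages for one trajectory.

    `values` has one extra entry for the bootstrap value V(s_T).
    With lam = 0 each advantage is just the one-step TD error
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With λ = 0 the recursion discards all but the current TD error, which is the setting the response above refers to; larger λ trades lower bias for higher variance by mixing in longer returns.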
We thank reviewers R1, R2, R3, and R4 for their constructive and helpful feedback. We aim to explore these ideas in future work. Our work is meant to complement these previous studies, and we have modified the Related Work section to discuss how these proposals complement our work. Regarding the comment "This is not a paper searching for state of the art results, and it should not be": we added these results to the revised manuscript.
NeurIPS 2019: Pseudo-Extended Markov chain Monte Carlo (paper ID: 2415). We would like to thank the reviewers for dedicating their time to review our paper and for the helpful feedback they have provided.
All of the reviewers' minor comments and corrections have been incorporated into the paper. Below, we address the reviewers' main questions.

The paper focuses on HMC sampling. Unfortunately, HMC cannot be applied in the discrete setting due to discontinuities in the target density.

"How do you recommend setting π and g to best estimate β?" It is quite straightforward to implement pseudo-extended HMC within Stan.

"As a minor comment, in line 58 it would be good to state that δ is an arbitrary differentiable function." This is a good point, and we have corrected it in the paper.

"The experiments in 4.1 and 4.2 use the RMSE error of the target variables, which is quite unusual."
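Since the response refers to the quantities π, g, and β without restating them, here is a generic sketch of the geometric tempering bridge that pseudo-extended methods build on: π_β(x) ∝ π(x)^β g(x)^(1−β) for β ∈ [0, 1]. The helper name and the toy normal densities below are illustrative assumptions, not the paper's exact construction:

```python
import math

def log_tempered(log_pi, log_g, x, beta):
    """Log of the geometric bridge pi_beta(x) ∝ pi(x)**beta * g(x)**(1 - beta).

    beta = 1 recovers the target pi; beta = 0 recovers the
    instrumental density g, which should be easy to sample from.
    """
    return beta * log_pi(x) + (1.0 - beta) * log_g(x)

# Toy example: standard normal target, wider normal instrumental density.
log_pi = lambda x: -0.5 * x * x - 0.5 * math.log(2 * math.pi)
log_g = lambda x: -0.5 * (x / 3.0) ** 2 - math.log(3.0) - 0.5 * math.log(2 * math.pi)
```

Because the bridge is differentiable in x whenever log π and log g are, gradient-based samplers such as HMC apply directly, which is what makes a Stan implementation straightforward.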
Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback
Seungjun Moon, Yongho Song, Hyungjoo Chae, Dongjin Kang, Taeyoon Kwon, Kai Tzu-iunn Ong, Seung-won Hwang, Jinyoung Yeo
Code editing is an essential step towards reliable program synthesis, automatically correcting critical errors generated by code LLMs. Recent studies have demonstrated that closed-source LLMs (e.g., ChatGPT and GPT-4) are capable of generating corrective feedback for editing erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these models tend to adhere to the superficial formats of feedback and provide feedback with misleading information. Hence, the focus of our work is to leverage open-source code LLMs to generate helpful feedback with correct guidance for code editing. To this end, we present Coffee, a dataset collected specifically for code fixing with feedback. Using this dataset, we construct CoffeePots, a framework for COde Fixing with FEEdback via Preference-Optimized Tuning and Selection. The proposed framework aims to automatically generate helpful feedback for code editing while minimizing the potential risk of superficial feedback. The combination of Coffee and CoffeePots marks a significant advancement, achieving state-of-the-art performance on the HumanEvalFix benchmark. Code and model checkpoints are publicly available at https://github.com/Lune-Blue/COFFEE.